Efficient Reinforcement Learning in Factored MDPs

Authors

  • Michael Kearns
  • Daphne Koller
Abstract

We present a provably efficient and near-optimal algorithm for reinforcement learning in Markov decision processes (MDPs) whose transition model can be factored as a dynamic Bayesian network (DBN). Our algorithm generalizes the recent E3 algorithm of Kearns and Singh, and assumes that we are given both an algorithm for approximate planning and the graphical structure (but not the parameters) of the DBN. Unlike the original E3 algorithm, our new algorithm exploits the DBN structure to achieve a running time that scales polynomially in the number of parameters of the DBN, which may be exponentially smaller than the number of global states.
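To make the parameter-count argument concrete, the following is a minimal sketch (not from the paper) of a DBN-factored transition model: each next-state variable depends only on a small parent set in the current state, so the number of conditional-probability entries grows with the parent-set sizes rather than with the number of global states. The variable count, parent structure, and CPTs below are all hypothetical illustrations.

```python
import itertools
import random

n = 10  # number of binary state variables; global state space has 2**n states

# Hypothetical DBN structure: next value of variable i depends on
# current variables {i, (i+1) mod n}. The algorithm in the paper
# assumes such a structure is given (but not its parameters).
parents = {i: (i, (i + 1) % n) for i in range(n)}

# Conditional probability tables: P(X_i' = 1 | parent values),
# one entry per joint assignment to the parents.
random.seed(0)
cpts = {
    i: {vals: random.random()
        for vals in itertools.product((0, 1), repeat=len(parents[i]))}
    for i in range(n)
}

def step(state):
    """Sample a successor state: each variable is drawn independently
    given its parents' current values, as in a DBN transition model."""
    return tuple(
        1 if random.random() < cpts[i][tuple(state[p] for p in parents[i])] else 0
        for i in range(n)
    )

# Parameter counts: the DBN needs one entry per parent assignment per
# variable; a flat transition table needs one row per global state.
dbn_params = sum(2 ** len(parents[i]) for i in range(n))  # 10 * 2^2 = 40
flat_rows = 2 ** n                                        # 2^10 = 1024
print(dbn_params, flat_rows)  # 40 vs 1024
```

With ten variables and parent sets of size two, the factored model has 40 parameters while the flat representation already has 1024 global states; the gap is exponential in general, which is the scaling the algorithm exploits.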


Similar resources

Efficient Structure Learning in Factored-State MDPs

We consider the problem of reinforcement learning in factored-state MDPs in the setting in which learning is conducted in one long trial with no resets allowed. We show how to extend existing efficient algorithms that learn the conditional probability tables of dynamic Bayesian networks (DBNs) given their structure to the case in which DBN structure is not known in advance. Our method learns th...


TeXDYNA: Hierarchical Reinforcement Learning in Factored MDPs

Reinforcement learning is one of the main adaptive mechanisms that is both well documented in animal behaviour and gives rise to computational studies in animats and robots. In this paper, we present TeXDYNA, an algorithm designed to solve large reinforcement learning problems with unknown structure by integrating hierarchical abstraction techniques of Hierarchical Reinforcement Learning and f...


Efficient Abstraction Selection in Reinforcement Learning

This paper introduces a novel approach for abstraction selection in reinforcement learning problems modelled as factored Markov decision processes (MDPs), for which a state is described via a set of state components. In abstraction selection, an agent must choose an abstraction from a set of candidate abstractions, each built up from a different combination of...


Chi-square Tests Driven Method for Learning the Structure of Factored MDPs

SDYNA is a general framework designed to address large stochastic reinforcement learning (RL) problems. Unlike previous model-based methods in factored MDPs (FMDPs), it incrementally learns the structure of an RL problem using supervised learning techniques. SPITI is an instantiation of SDYNA that uses decision trees as factored representations. First, we show that, in structured RL problems, SP...


Sample Efficient Feature Selection for Factored MDPs

In reinforcement learning, the state of the real world is often represented by feature vectors. However, not all of the features may be pertinent for solving the current task. We propose Feature Selection Explore and Exploit (FS-EE), an algorithm that automatically selects the necessary features while learning a Factored Markov Decision Process, and prove that under mild assumptions, its sample...




Publication date: 1999